WHAT IS CLAIMED IS:

1. A document segmentation apparatus comprising:

table analyzing means for generating cell
position data indicating a positional relationship
between cells and cell vectors representing
characteristics of the cells, by analyzing a table in
a document to be processed;

table type judging means for judging a table
type with reference to the cell position data and the
cell vectors generated by said table analyzing means;

first segment generating means for generating
a segment from the table when the table type is a
table for showing a table; and

second segment generating means for
generating a segment from the table when the table
type is a table for layout.
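
For illustration only, the four means of claim 1 can be pictured as a small pipeline. The sketch below is not the claimed implementation: the `Cell` type, the two-feature cell vector (a heading flag and the text length), and the heading-based rule in `judge_table_type` are all assumptions introduced so that the example runs.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    row: int
    col: int
    text: str
    is_header: bool = False  # hypothetical feature, e.g. taken from markup


def analyze_table(cells):
    """Return (cell position data, cell vectors) for a list of cells."""
    positions = {(c.row, c.col): i for i, c in enumerate(cells)}
    vectors = [(1 if c.is_header else 0, len(c.text)) for c in cells]
    return positions, vectors


def judge_table_type(positions, vectors):
    """Assumed rule: a full heading row or heading column means a data table."""
    n_rows = max(r for r, _ in positions) + 1
    n_cols = max(c for _, c in positions) + 1

    def header(r, c):
        return (r, c) in positions and vectors[positions[(r, c)]][0] == 1

    first_row = all(header(0, c) for c in range(n_cols))
    first_col = all(header(r, 0) for r in range(n_rows))
    return "data" if first_row or first_col else "layout"


def segment_table(cells):
    positions, vectors = analyze_table(cells)
    table_type = judge_table_type(positions, vectors)
    if table_type == "data":
        # First segment generating means: here the whole table is one segment.
        return table_type, [[c.text for c in cells]]
    # Second segment generating means: here each cell becomes its own segment.
    return table_type, [[c.text] for c in cells]
```

The two branches in `segment_table` are deliberately trivial; the dependent claims describe richer behaviour for each branch.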

2. A document segmentation apparatus according
to claim 1, wherein said first segment generating
means comprise:

cut direction determination means for
determining a cut direction of the table by judging
whether the data is expressed in a column or a row in
the table on the basis of the cell position data and
the cell vectors; and

table segment generating means for generating
a table segment by dividing the table on the basis of
the table type and the cut direction.
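
One possible reading of the cut direction determination in claim 2 is sketched below. It assumes the table has already been judged to be a data table and that the cell vectors expose a heading flag per cell; the `header_flags` grid and the comparison heuristic are assumptions, not taken from the claims.

```python
def determine_cut_direction(header_flags):
    """header_flags[r][c] is True when cell (r, c) looks like a heading.

    Assumed heuristic: compare how heading-like the first row and the
    first column are, and cut across whichever one holds the headings.
    """
    first_row = sum(header_flags[0]) / len(header_flags[0])
    first_col = sum(row[0] for row in header_flags) / len(header_flags)
    # Headings in the first row mean each later row is one record.
    return "row" if first_row >= first_col else "column"


def generate_table_segments(grid, direction):
    """Divide the table into one segment per record, repeating the headings."""
    if direction == "column":
        grid = [list(col) for col in zip(*grid)]  # transpose, then cut by row
    headings, records = grid[0], grid[1:]
    return [list(zip(headings, record)) for record in records]
```

Repeating the headings in every segment keeps each segment readable on its own, which is one plausible motivation for cutting a data table along its records.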



3. A document segmentation apparatus according
to claim 2, wherein said second segment generating
means generate the table itself as the segment.



4. A document segmentation apparatus according
to claim 1, wherein said second segment generating
means comprise:

cell cluster generating means for generating
cell cluster information by clustering the cells in
the table; and

layout segment generating means for
generating segment by connecting the cells in the
table with reference to the cell position data and the
cell cluster information.
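
The clustering and connecting steps of claim 4 could, for example, be realised as a labelling pass followed by a flood fill. The sketch below assumes that cells with identical cell vectors belong to one cluster and that "connecting" means joining 4-connected neighbours with the same cluster label; both choices are illustrative assumptions.

```python
def cluster_cells(vector_grid):
    """Assumed clustering: cells with identical vectors share a label."""
    labels = {}
    return [[labels.setdefault(v, len(labels)) for v in row] for row in vector_grid]


def generate_layout_segments(grid, cluster_of):
    """Connect neighbouring cells that share a cluster label.

    grid[r][c] is the cell text; cluster_of[r][c] is its cluster label.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    seen = set()
    segments = []
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) in seen:
                continue
            # Flood fill over 4-connected neighbours with the same label.
            stack, members = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                members.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and (ny, nx) not in seen
                            and cluster_of[ny][nx] == cluster_of[y][x]):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            segments.append([grid[y][x] for y, x in sorted(members)])
    return segments
```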






5. A document segmentation apparatus according 
to claim 4, wherein said first segment generating 
means generate the table itself as the segment. 






6. A document segmentation apparatus according
to claim 4, wherein said second segment generating
means generate the table itself as the segment.



7. A document segmentation apparatus according
to claim 1, further comprising normal segment
generating means for dividing the document into a
segment which corresponds to one table;
and wherein

the table generated as one segment by said
normal segment generating means is to be processed by
said table analyzing means.

8. A document segmentation apparatus according
to claim 1, wherein said table analyzing means further
generate cell data of the analyzed table and said
table type judging means judge the table type with
reference to the cell data.

9. A document segmentation apparatus according
to claim 8, wherein said table type judging means
comprise similarity judging means for judging the
table type on the basis of similarity between the cell
data positioned at particular positions with reference
to the cell position data and the cell data generated
by said table analyzing means.
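
A similarity judgment of the kind recited in claim 9 might compare the cell data of vertically adjacent cells. The sketch below is one assumption-laden way to do that: it reduces each cell to a character-class pattern (the hypothetical `shape` helper), scores neighbouring cells with `difflib.SequenceMatcher` from the Python standard library, and applies an arbitrary threshold of 0.8. None of these choices comes from the claims.

```python
from difflib import SequenceMatcher


def shape(text):
    """Reduce cell data to a character-class pattern, e.g. '12.5kg' -> '9.9a'."""
    out = []
    for ch in text:
        cls = "9" if ch.isdigit() else "a" if ch.isalpha() else ch
        if not out or out[-1] != cls:
            out.append(cls)
    return "".join(out)


def column_similarity(grid):
    """Mean similarity between vertically adjacent cells, skipping row 0."""
    scores = []
    for col in zip(*grid[1:]):
        for a, b in zip(col, col[1:]):
            scores.append(SequenceMatcher(None, shape(a), shape(b)).ratio())
    return sum(scores) / len(scores) if scores else 0.0


def judge_by_similarity(grid, threshold=0.8):
    """Assumed rule: uniform columns suggest a data table."""
    return "data" if column_similarity(grid) >= threshold else "layout"
```

Row 0 is skipped on the assumption that it may hold headings, which would otherwise lower the score of an otherwise uniform column.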

10. A document segmentation apparatus according
to claim 8, wherein said table type judging means
comprise partial character line extracting means for
extracting partial character lines from the cell data
positioned at a particular position with reference to
the cell position data and the cell data generated by
said table analyzing means, and character line
comparing means for comparing the extracted partial
character lines to judge the table type.
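
The claims do not say how a partial character line is chosen. Purely as an assumption, the sketch below takes a fixed-length suffix of each cell in a column (for instance a unit such as "kg") and judges the table type by how many cells share the most common suffix. The function names, the suffix length, and the 0.8 share are all hypothetical.

```python
def extract_partial(text, length=2, from_end=True):
    """Take a fixed-length prefix or suffix of the cell data (assumed rule)."""
    return text[-length:] if from_end else text[:length]


def judge_by_partial_lines(column, length=2, min_share=0.8):
    """Assumed rule: a data table if most cells in the column share a suffix."""
    parts = [extract_partial(t, length) for t in column]
    most_common = max(set(parts), key=parts.count)
    share = parts.count(most_common) / len(parts)
    return "data" if share >= min_share else "layout"
```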

11. A document segmentation apparatus according
to claim 8, wherein said table type judging means
comprise partial character line extracting means for
extracting partial character lines from the cell data
positioned at a particular position with reference to
the cell position data and the cell data generated by
said table analyzing means, and similarity judging
means for judging the table type on the basis of
similarity between the extracted partial character
lines.



12. A document segmentation apparatus according
to claim 8, wherein said table type judging means
comprise syntax judging means for judging the table
type with reference to the cell position data, the
cell vectors and the cell data generated by said table
analyzing means, and similarity judging means for
judging the table type on the basis of similarity
between the cell data positioned at particular
positions with reference to the cell position data and
the cell data generated by said table analyzing means.



13. A document segmentation apparatus according
to claim 8, wherein said table type judging means
comprise syntax judging means for judging the table
type with reference to the cell position data, the
cell vectors and the cell data generated by said table
analyzing means, partial character line extracting
means for extracting partial character lines from the
cell data positioned at a particular position with
reference to the cell position data and the cell data
generated by said table analyzing means, and character
line comparing means for comparing the extracted
partial character lines to judge the table type.

14. A document segmentation apparatus according
to claim 8, wherein said table type judging means
comprise syntax judging means for judging the table
type with reference to the cell position data, the
cell vectors and the cell data generated by said table
analyzing means, partial character line extracting
means for extracting partial character lines from the
cell data positioned at a particular position with
reference to the cell position data and the cell data
generated by said table analyzing means, and
similarity judging means for judging the table type on
the basis of similarity between the extracted partial
character lines.



15. A document segmentation apparatus according
to claim 1, further comprising table reforming means
for reforming the table so that the number of cells in
each column and each row becomes the same, by
analyzing the table to be processed;
and wherein

said table analyzing means analyze the
reformed table.
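
To make the goal of the reforming step concrete, the sketch below expands cells that span several rows or columns until every row holds the same number of cells. The input format, a list of rows whose cells are `(text, rowspan, colspan)` tuples, and the choice to copy the spanning cell's text into every covered position are assumptions made for illustration.

```python
def reform_table(rows):
    """Expand spanning cells so every row has the same number of cells.

    rows is a list of rows; each cell is (text, rowspan, colspan).
    Spanned positions are filled with a copy of the spanning cell's text
    (an assumption; leaving them empty would also equalise the counts).
    """
    grid = {}
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip slots taken by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

After this step the table is a plain rectangular grid, so position data such as `(row, column)` is well defined for every cell.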



16. A document segmentation apparatus according
to claim 15, wherein said table reforming means
comprise supplementary data removing means for
removing data added to the table from the table data.



17. A document segmentation apparatus according
to claim 15, wherein said table reforming means
comprise multi-row/multi-column processing means for
reforming the table regularly by analyzing the
structure of the table data.



18. A document segmentation apparatus according 
to claim 15, wherein said table reforming means 
comprise composite table processing means for 
reforming the table by analyzing regularity of 
information description constituting the table. 

19. A document segmentation apparatus according
to claim 15, wherein said table reforming means
comprise:

supplementary data removing means for
removing data added to the table from the table data;
and

multi-row/multi-column processing means for
reforming the table regularly by analyzing the
structure of the table data.



20. A document segmentation apparatus according
to claim 15, wherein said table reforming means
comprise:

supplementary data removing means for
removing data added to the table from the table data;
and

composite table processing means for
reforming the table by analyzing regularity of
information description constituting the table.



21. A document segmentation apparatus according
to claim 15, wherein said table reforming means
comprise:

multi-row/multi-column processing means for
reforming the table regularly by analyzing the
structure of the table data; and

composite table processing means for
reforming the table by analyzing regularity of
information description constituting the table.

22. A document segmentation apparatus according
to claim 15, wherein said table reforming means
comprise:

supplementary data removing means for
removing data added to the table from the table data;

multi-row/multi-column processing means for
reforming the table regularly by analyzing the
structure of the table data; and

composite table processing means for
reforming the table by analyzing regularity of
information description constituting the table.



23. A document segmentation method comprising:

a table analyzing step for generating cell
position data indicating a positional relationship
between cells and cell vectors representing
characteristics of the cells, by analyzing a table in
a document to be processed;

a table type judging step for judging a
table type with reference to the cell position data
and the cell vectors generated by said table analyzing
step;

a first segment generating step for
generating a segment from the table when the table
type is a table describing a table; and

a second segment generating step for
generating a segment from the table when the table
type is a table for layout.

24. A document segmentation method according to
claim 23, wherein said first segment generating step
comprises:

a cut direction determination step for
determining a cut direction of the table by judging
whether the data is expressed in a column or a row in
the table on the basis of the cell position data and
the cell vectors; and

a table segment generating step for
generating a table segment by dividing the table on
the basis of the table type and the cut direction.



25. A document segmentation method according to
claim 24, wherein said second segment generating step
generates the table itself as the segment.



26. A document segmentation method according to
claim 23, wherein said second segment generating step
comprises:

a cell cluster generating step for
generating cell cluster information by clustering the
cells in the table; and

a layout segment generating step for
generating segment by connecting the cells in the
table with reference to the cell position data and the
cell cluster information.

27. A document segmentation method according to 
claim 26, wherein said first segment generating step 
generates the table itself as the segment. 

28. A document segmentation method according to 
claim 26, wherein said second segment generating step 
generates the table itself as the segment. 



29. A document segmentation method according to
claim 23, further comprising a normal segment
generating step for dividing the document into a
segment which corresponds to one table;
and wherein

the table generated as one segment by said
normal segment generating step is to be processed by
said table analyzing step.



30. A document segmentation method according to
claim 23, wherein said table analyzing step further
generates cell data of the analyzed table and said
table type judging step judges the table type with
reference to the cell data.



31. A document segmentation method according to
claim 30, wherein said table type judging step
comprises a similarity judging step for judging the
table type on the basis of similarity between the cell
data positioned at particular positions with reference
to the cell position data and the cell data generated
by said table analyzing step.

32. A document segmentation method according to
claim 30, wherein said table type judging step
comprises a partial character line extracting step for
extracting partial character lines from the cell data
positioned at a particular position with reference to
the cell position data and the cell data generated by
said table analyzing step, and a character line
comparing step for comparing the extracted partial
character lines to judge the table type.

33. A document segmentation method according to
claim 30, wherein said table type judging step
comprises a partial character line extracting means
for extracting partial character lines from the cell
data positioned at a particular position with
reference to the cell position data and the cell data
generated by said table analyzing step, and a
similarity judging step for judging the table type on
the basis of similarity between the extracted partial
character lines.

34. A document segmentation method according to
claim 30, wherein said table type judging step
comprises a syntax judging step for judging the table
type with reference to the cell position data, the
cell vectors and the cell data generated by said table
analyzing step, and a similarity judging step for
judging the table type on the basis of similarity
between the cell data positioned at particular
positions with reference to the cell position data and
the cell data generated by said table analyzing step.



35. A document segmentation method according to
claim 30, wherein said table type judging step
comprises a syntax judging step for judging the table
type with reference to the cell position data, the
cell vectors and the cell data generated by said table
analyzing step, a partial character line extracting
step for extracting partial character lines from the
cell data positioned at a particular position with
reference to the cell position data and the cell data
generated by said table analyzing step, and a
character line comparing step for comparing the
extracted partial character lines to judge the table
type.






36. A document segmentation method according to
claim 30, wherein said table type judging step
comprises a syntax judging step for judging the table
type with reference to the cell position data, the
cell vectors and the cell data generated by said table
analyzing step, a partial character line extracting
step for extracting partial character lines from the
cell data positioned at a particular position with
reference to the cell position data and the cell data
generated by said table analyzing step, and a
similarity judging means for judging the table type on
the basis of similarity between the extracted partial
character lines.

37. A document segmentation method according to
claim 23, further comprising a table reforming step
for reforming the table so that the number of cells in
each column and each row becomes the same, by
analyzing the table to be processed;
and wherein

said table analyzing step analyzes the
reformed table.

38. A document segmentation method according to
claim 37, wherein said table reforming step comprises
a supplementary data removing step for removing data
added to the table from the table data.

39. A document segmentation method according to
claim 37, wherein said table reforming step comprises
a multi-row/multi-column processing step for reforming
the table regularly by analyzing the structure of the
table data.


40. A document segmentation method according to
claim 37, wherein said table reforming step comprises
a composite table processing step for reforming the
table by analyzing regularity of information
description constituting the table.

41. A document segmentation method according to
claim 37, wherein said table reforming step comprises:

a supplementary data removing step for
removing data added to the table from the table data;
and

a multi-row/multi-column processing step for
reforming the table regularly by analyzing the
structure of the table data.


42. A document segmentation method according to
claim 37, wherein said table reforming step comprises:

a supplementary data removing step for
removing data added to the table from the table data;
and

a composite table processing step for
reforming the table by analyzing regularity of
information description constituting the table.

43. A document segmentation method according to
claim 37, wherein said table reforming step comprises:

a multi-row/multi-column processing step for
reforming the table regularly by analyzing the
structure of the table data; and

a composite table processing step for
reforming the table by analyzing regularity of
information description constituting the table.

44. A document segmentation method according to 
claim 37, wherein said table reforming step comprises:

a supplementary data removing step for 
removing data added to the table from the table data; 

a multi-row/multi-column processing step for 
reforming the table regularly by analyzing the 
structure of the table data; and 

a composite table processing step for 
reforming the table by analyzing regularity of 
information description constituting the table. 

45. A computer-readable storage medium storing a
document segmentation program for controlling a
computer to perform document segmentation, said
program comprising codes for causing the computer to
perform:

a table analyzing step for generating cell
position data indicating a positional relationship
between cells and cell vectors representing
characteristics of the cells, by analyzing a table in
a document to be processed;

a table type judging step for judging a
table type with reference to the cell position data
and the cell vectors generated by said table analyzing
step;

a first segment generating step for
generating a segment from the table when the table
type is a table describing a table; and

a second segment generating step for
generating a segment from the table when the table
type is a table for layout.



